Identification of protein coding regions in the human genome by quadratic discriminant analysis.

Author

  • M Q Zhang
Abstract

A new method for predicting internal coding exons in genomic DNA sequences has been developed. This method is based on a prediction algorithm that uses the quadratic discriminant function for multivariate statistical pattern recognition. Substantial improvements have been made (with only 9 discriminant variables) when compared with existing methods: HEXON [Solovyev, V. V., Salamov, A. A. & Lawrence, C. B. (1994) Nucleic Acids Res. 22, 5156-5163] (based on linear discriminant analysis) and GRAIL2 [Uberbacher, E. C. & Mural, R. J. (1991) Proc. Natl. Acad. Sci. USA 88, 11261-11265] (based on neural networks). A computer program called MZEF is freely available to the genome community and allows users to adjust prior probability and to output alternative overlapping exons.
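As a rough illustration of the idea named in the abstract, the sketch below fits one Gaussian per class and combines the class log-densities with an adjustable prior to score a candidate exon. It is not the MZEF implementation. The two-feature setup, the function names, and the toy numbers are all assumptions made for illustration; the paper reports 9 discriminant variables, which are not listed here.

```python
import math

def fit_gaussian(rows):
    """Estimate the mean and covariance (sxx, sxy, syy) of 2-D feature vectors."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def log_density(x, mean, cov):
    """Log of the bivariate normal density at x (closed-form 2x2 inverse)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance: the quadratic term of the discriminant.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (maha + math.log(det)) - math.log(2 * math.pi)

def exon_posterior(x, exon_model, non_model, prior_exon=0.5):
    """Posterior probability of the exon class under a quadratic discriminant."""
    log_e = log_density(x, *exon_model) + math.log(prior_exon)
    log_n = log_density(x, *non_model) + math.log(1.0 - prior_exon)
    top = max(log_e, log_n)  # subtract the max for numerical stability
    pe, pn = math.exp(log_e - top), math.exp(log_n - top)
    return pe / (pe + pn)

# Toy training data (invented): two hypothetical scores per candidate region.
EXONS = [(0.8, 0.7), (0.9, 0.6), (0.7, 0.9), (0.85, 0.8), (0.75, 0.65)]
NON_EXONS = [(0.2, 0.3), (0.4, 0.1), (0.1, 0.5), (0.3, 0.35), (0.5, 0.2)]

EXON_MODEL = fit_gaussian(EXONS)
NON_MODEL = fit_gaussian(NON_EXONS)

if __name__ == "__main__":
    for prior in (0.1, 0.5, 0.9):
        p = exon_posterior((0.55, 0.5), EXON_MODEL, NON_MODEL, prior_exon=prior)
        print(f"prior={prior:.1f} posterior={p:.4f}")
```

Because each class keeps its own covariance, the decision boundary is quadratic rather than linear, which is the distinction the abstract draws with the linear discriminant approach. Raising or lowering `prior_exon` shifts the boundary, loosely mirroring the adjustable prior probability described for MZEF.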

Similar resources

Long non-coding RNAs and their significance in human diseases

Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...


Phylogenetic Analysis of Three Long Non-coding RNA Genes: AK082072, AK043754 and AK082467

It is now clear that protein is just one of the functional products of the eukaryotic genome. Indeed, a larger part of the human genome is transcribed into non-coding sequences than into protein-coding sequences. In this study, we selected three long non-coding RNAs, namely AK082072, AK043754 and AK082467, which show brain expression and local region conservation among vertebr...


The Validation of the Thermal Regions in Iran with an Emphasis on the Identification of the Climatic Cycles

Background: The present study aimed to validate the thermal regions in Iran with an emphasis on the identification of the climatic cycles during the recent half-century. Methods: Data on daily temperature were extracted for 383 synoptic stations of Iran Meteorological Organization. For the zoning of the temperatures of Iran, multivariate statistical techniques (cluster analysis and discriminan...


P-85: How a Frame Shift Caused by a Single Base Deletion In SEPT12 Gene Shed Lights As a Polymorphism

Background: Septins are members of highly conserved polymerizing GTP binding proteins well described in the animal kingdom. 14 Septin proteins have been characterized in humans (SEPT1-SEPT14), some of which are tissue-specific. All of 14 genome-mapped human septins contain a highly conserved central GTP-binding domain which is very critical in GTPase signaling properties as well as oligomerizat...


The Gene-Finder Computer Tools for Analysis of Human and Model Organisms Genome Sequences

We present a complex of new programs for promoter, 3'-processing, splice sites, coding exons and gene structure identification in genomic DNA of several model species. The human gene structure prediction program FGENEH, exon prediction-FEXH and splice site prediction-HSPL have been modified for sequence analysis of Drosophila (FGENED, FEXD and DSPL), C. elegans (FGENEN, FEXN and NSPL), Yeast (F...


Journal title:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 94, Issue 2

Pages -

Publication date: 1997